A Novel Scalable Feature Extraction Approach for COVID-19 Protein Sequences and their Cluster Analysis with Kernelized Fuzzy Algorithm

Jha, P.; Tiwari, A.; Bharill, N.; Ratnaparkhe, M.; Patel, O. P.; Harshith, N.; Solasa, S. L.

2022 IEEE International Conference on Big Data and Smart Computing, BigComp 2022: 56-59, 2022.

Article in English | Scopus | ID: covidwho-1788619

ABSTRACT

COVID-19 (Coronavirus Disease-19), a disease caused by the SARS-CoV-2 virus, was declared a pandemic by the World Health Organization on March 11, 2020. Analyzing the different variants of COVID-19 genome sequences is a global problem, and it calls for intelligent, scalable machine learning techniques that can process and analyze important COVID-19 protein data using a Big Data framework. To this end, we first propose a feature extraction approach for COVID-19 protein data named the Scalable Distributed Co-occurrence-based Probability-Specific Feature extraction approach (SDCPSF). SDCPSF is executed on an Apache Spark cluster to preprocess the massive COVID-19 protein sequences, and it represents each variable-length COVID-19 protein sequence as a fixed-length, six-dimensional numeric feature vector. The extracted features are then used as input to the kernelized fuzzy clustering algorithms KSRSIO-FCM and KSLFCM, which cluster big data efficiently through in-memory cluster computing and thus form clusters of COVID-19 genome sequences. Furthermore, the performance of KSRSIO-FCM is compared with that of the other scalable clustering algorithm, KSLFCM, in terms of the Silhouette index (SI) and the Davies-Bouldin index (DBI). © 2022 IEEE.
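The two validity indices named in the abstract can be illustrated with a minimal sketch. This is not the authors' Spark implementation: it uses hard cluster assignments, Euclidean distance, and a tiny two-dimensional toy dataset instead of the six-dimensional SDCPSF features, and the function names `centroid`, `davies_bouldin`, and `silhouette` are hypothetical helpers introduced here only for illustration.

```python
from math import dist
from statistics import fmean


def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    return tuple(fmean(axis) for axis in zip(*points))


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters; lower is better."""
    centers = [centroid(c) for c in clusters]
    # Scatter of a cluster: mean distance from its members to its centroid.
    scatter = [fmean(dist(p, ctr) for p in c) for c, ctr in zip(clusters, centers)]
    worst = []
    for i in range(len(clusters)):
        # Ratio of combined scatter to centroid separation, for every other cluster.
        ratios = [
            (scatter[i] + scatter[j]) / dist(centers[i], centers[j])
            for j in range(len(clusters))
            if j != i
        ]
        worst.append(max(ratios))
    return fmean(worst)


def silhouette(clusters):
    """Mean silhouette coefficient over all points; higher is better."""
    scores = []
    for i, own in enumerate(clusters):
        for p in own:
            if len(own) == 1:
                # Convention: a singleton cluster contributes a score of 0.
                scores.append(0.0)
                continue
            # a: mean distance to the other members of the same cluster.
            a = sum(dist(p, q) for q in own) / (len(own) - 1)
            # b: smallest mean distance to the members of any other cluster.
            b = min(
                fmean(dist(p, q) for q in other)
                for j, other in enumerate(clusters)
                if j != i
            )
            scores.append((b - a) / max(a, b))
    return fmean(scores)


# Toy data (an assumption for illustration): two well-separated clusters.
clusters = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
dbi = davies_bouldin(clusters)
si = silhouette(clusters)
print(f"DBI = {dbi:.3f}, SI = {si:.3f}")
```

A lower DBI and a higher SI both indicate compact, well-separated clusters, which is the sense in which the two algorithms would be compared.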
